Production models as a structural basis for automatic speech recognition

Authors

Li Deng

Gordon Ramsay

Don X. Sun

Abstract

We postulate in this paper that highly structured speech production models will have much to contribute to the ultimate success of speech recognition in view of the weaknesses of the theoretical foundation underpinning current technology. These weaknesses are analyzed in terms of phonological modeling and of phonetic-interface modeling. We present two probabilistic speech recognition models with the structure designed based on approximations to human speech production mechanisms, and conclude by suggesting that many of the advantages to be gained from interaction between speech production and speech recognition communities will develop from integrating production models with the probabilistic analysis-by-synthesis strategy currently used by the technology community. © 1997 Elsevier Science B.V.

Résumé

Dans cet article, nous suggérons que des modèles de production de la parole fortement structurés pourront contribuer significativement à la réussite future des modèles de reconnaissance automatique de la parole, limités en ce moment par les faiblesses de la base théorique de la technologie actuelle. Nous analysons ces faiblesses au niveau des modèles phonologiques et des modèles phonétiques, et présentons deux modèles statistiques de reconnaissance de la parole basés sur des approximations des mécanismes de production de la parole. Nous suggérons en conclusion que l'interaction entre les domaines de la production et de la reconnaissance de la parole peut être particulièrement efficace si l'on intègre les modèles de production dans la stratégie d'analyse-synthèse probabiliste, utilisée déjà depuis longtemps en reconnaissance de la parole. © 1997 Elsevier Science B.V.


Similar resources

Designing and implementing a system for Automatic recognition of Persian letters by Lip-reading using image processing methods

For many years, speech has been the most natural and efficient means of information exchange for human beings. With the advancement of technology and the prevalence of computer usage, the design and production of speech recognition systems have been considered by researchers. Among this, lip-reading techniques encountered with many challenges for speech recognition, that one of the challenges b...


A Database for Automatic Persian Speech Emotion Recognition: Collection, Processing and Evaluation

Abstract Recent developments in robotics automation have motivated researchers to improve the efficiency of interactive systems by making a natural man-machine interaction. Since speech is the most popular method of communication, recognizing human emotions from speech signal becomes a challenging research topic known as Speech Emotion Recognition (SER). In this study, we propose a Persian em...


Off-line Arabic Handwritten Recognition Using a Novel Hybrid HMM-DNN Model

In order to facilitate the entry of data into the computer and its digitalization, automatic recognition of printed texts and manuscripts is one of the considerable aid to many applications. Research on automatic document recognition started decades ago with the recognition of isolated digits and letters, and today, due to advancements in machine learning methods, efforts are being made to iden...


Allophone-based acoustic modeling for Persian phoneme recognition

Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...


Speech Emotion Recognition Based on Power Normalized Cepstral Coefficients in Noisy Conditions

Automatic recognition of speech emotional states in noisy conditions has become an important research topic in the emotional speech recognition area, in recent years. This paper considers the recognition of emotional states via speech in real environments. For this task, we employ the power normalized cepstral coefficients (PNCC) in a speech emotion recognition system. We investigate its perfor...



Journal:

Speech Communication

Volume 22

Publication date: 1997
